Introduction

This is our R Notebook, showing the steps we took to complete the Final Project for CS 329E. This notebook includes step-by-step instructions on how to reproduce our project. To obtain our data, we used data.world.

R Configuration

Below we display our sessionInfo().

sessionInfo(package=NULL)
## R version 3.2.4 Revised (2016-03-16 r70336)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 14393)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] backports_1.0.5 magrittr_1.5    rprojroot_1.2   tools_3.2.4    
##  [5] htmltools_0.3.5 yaml_2.1.14     Rcpp_0.12.10    stringi_1.1.2  
##  [9] rmarkdown_1.3   knitr_1.15.1    stringr_1.1.0   digest_0.6.12  
## [13] evaluate_0.10

Data Description

The data was found on “Dr. John Rasp’s Statistics Website” (http://www2.stetson.edu/~jrasp/data.htm). It is a subset of the data from College Scorecard, a Department of Education website that gives data on various variables regarding tuition, costs and school performance.

An explanatory key for the recorded variables can be found here: https://data.world/jlee/s-17-dv-final-project/file/CollegeScorecard_ColumnNames.pdf

Cleaning Data

Here’s our ETL file to clean our data set.

source("../01 Data/R_ETL.CollegeScorecard.R")
## Loading required package: readr
## Warning: package 'readr' was built under R version 3.2.5
## Loading required package: plyr
## Warning: package 'plyr' was built under R version 3.2.5
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   UNITID = col_integer(),
##   CONTROL = col_integer(),
##   CCBASIC = col_integer()
## )
## See spec(...) for full column specifications.
## Warning: 421 parsing failures.
##  row     col   expected actual                                     file
## 7283 CCBASIC an integer   NULL '../../CSVs/PreETL_CollegeScorecard.csv'
## 7284 CCBASIC an integer   NULL '../../CSVs/PreETL_CollegeScorecard.csv'
## 7285 CCBASIC an integer   NULL '../../CSVs/PreETL_CollegeScorecard.csv'
## 7286 CCBASIC an integer   NULL '../../CSVs/PreETL_CollegeScorecard.csv'
## 7287 CCBASIC an integer   NULL '../../CSVs/PreETL_CollegeScorecard.csv'
## .... ....... .......... ...... ........................................
## See problems(...) for more details.
## Classes 'tbl_df', 'tbl' and 'data.frame':    7703 obs. of  30 variables:
##  $ UNITID     : int  100654 100663 100690 100706 100724 100751 100760 100812 100830 100858 ...
##  $ INSTNM     : chr  "Alabama A & M University" "University of Alabama at Birmingham" "Amridge University" "University of Alabama in Huntsville" ...
##  $ CITY       : chr  "Normal" "Birmingham" "Montgomery" "Huntsville" ...
##  $ STABBR     : chr  "AL" "AL" "AL" "AL" ...
##  $ CONTROL    : int  1 1 2 1 1 1 1 1 1 1 ...
##  $ CCBASIC    : int  18 15 20 16 19 16 1 22 18 16 ...
##  $ ADM_RATE   : chr  "0.5256" "0.8569" "NULL" "0.8203" ...
##  $ SAT_AVG    : chr  "827" "1107" "NULL" "1219" ...
##  $ UGDS       : chr  "4206" "11383" "291" "5451" ...
##  $ UGDS_WHITE : chr  "0.0333" "0.5922" "0.299" "0.6988" ...
##  $ UGDS_BLACK : chr  "0.9353" "0.26" "0.4192" "0.1255" ...
##  $ UGDS_HISP  : chr  "0.0055" "0.0283" "0.0069" "0.0382" ...
##  $ UGDS_ASIAN : chr  "0.0019" "0.0518" "0.0034" "0.0376" ...
##  $ UGDS_AIAN  : chr  "0.0024" "0.0022" "0" "0.0143" ...
##  $ UGDS_NHPI  : chr  "0.0019" "0.0007" "0" "0.0002" ...
##  $ UGDS_2MOR  : chr  "0" "0.0368" "0" "0.0172" ...
##  $ UGDS_NRA   : chr  "0.0059" "0.0179" "0" "0.0332" ...
##  $ UGDS_UNKN  : chr  "0.0138" "0.01" "0.2715" "0.035" ...
##  $ PPTUG_EF   : chr  "0.0656" "0.2607" "0.4536" "0.2146" ...
##  $ NPT4_PUB   : chr  "15229" "14789" "NULL" "18596" ...
##  $ NPT4_PRIV  : chr  "NULL" "NULL" "12992" "NULL" ...
##  $ COSTT4_A   : chr  "21475" "20621" "16370" "21107" ...
##  $ TUITFTE    : chr  "9427" "9899" "12459" "8956" ...
##  $ INEXPFTE   : chr  "7437" "17920" "5532" "10211" ...
##  $ PFTFAC     : chr  "0.8967" "0.9072" "0.6" "0.6221" ...
##  $ PCTPELL    : chr  "0.7356" "0.346" "0.6801" "0.3072" ...
##  $ C150_4     : chr  "0.3525" "0.5554" "0.2222" "0.4614" ...
##  $ PFTFTUG1_EF: chr  "0.8578" "0.5041" "0.5" "0.475" ...
##  $ RET_FT4    : chr  "0.6595" "0.8288" "0" "0.7696" ...
##  $ PCTFLOAN   : chr  "0.8284" "0.5214" "0.7795" "0.4596" ...
##  - attr(*, "problems")=Classes 'tbl_df', 'tbl' and 'data.frame': 421 obs. of  5 variables:
##   ..$ row     : int  7283 7284 7285 7286 7287 7288 7289 7290 7291 7292 ...
##   ..$ col     : chr  "CCBASIC" "CCBASIC" "CCBASIC" "CCBASIC" ...
##   ..$ expected: chr  "an integer" "an integer" "an integer" "an integer" ...
##   ..$ actual  : chr  "NULL" "NULL" "NULL" "NULL" ...
##   ..$ file    : chr  "'../../CSVs/PreETL_CollegeScorecard.csv'" "'../../CSVs/PreETL_CollegeScorecard.csv'" "'../../CSVs/PreETL_CollegeScorecard.csv'" "'../../CSVs/PreETL_CollegeScorecard.csv'" ...
##  - attr(*, "spec")=List of 2
##   ..$ cols   :List of 30
##   .. ..$ UNITID     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ INSTNM     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ CITY       : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ STABBR     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ CONTROL    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ CCBASIC    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ ADM_RATE   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ SAT_AVG    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS       : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS_WHITE : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS_BLACK : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS_HISP  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS_ASIAN : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS_AIAN  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS_NHPI  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS_2MOR  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS_NRA   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS_UNKN  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ PPTUG_EF   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ NPT4_PUB   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ NPT4_PRIV  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ COSTT4_A   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ TUITFTE    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ INEXPFTE   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ PFTFAC     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ PCTPELL    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ C150_4     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ PFTFTUG1_EF: list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ RET_FT4    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ PCTFLOAN   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   ..$ default: list()
##   .. ..- attr(*, "class")= chr  "collector_guess" "collector"
##   ..- attr(*, "class")= chr "col_spec"
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = ""): invalid factor
## level, NA generated
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = ""): invalid factor
## level, NA generated

## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = ""): invalid factor
## level, NA generated

## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = ""): invalid factor
## level, NA generated

## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = ""): invalid factor
## level, NA generated

## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = ""): invalid factor
## level, NA generated
## [1] "ADM_RATE"
## [1] "SAT_AVG"
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = 0): invalid factor
## level, NA generated
## [1] "UGDS"
## [1] "UGDS_WHITE"
## [1] "UGDS_BLACK"
## [1] "UGDS_HISP"
## [1] "UGDS_ASIAN"
## [1] "UGDS_AIAN"
## [1] "UGDS_NHPI"
## [1] "UGDS_2MOR"
## [1] "UGDS_NRA"
## [1] "UGDS_UNKN"
## [1] "PPTUG_EF"
## [1] "NPT4_PUB"
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = 0): invalid factor
## level, NA generated
## [1] "NPT4_PRIV"
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = 0): invalid factor
## level, NA generated
## [1] "COSTT4_A"
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = 0): invalid factor
## level, NA generated
## [1] "TUITFTE"
## [1] "INEXPFTE"
## [1] "PFTFAC"
## [1] "PCTPELL"
## [1] "C150_4"
## [1] "PFTFTUG1_EF"
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = 0): invalid factor
## level, NA generated
## [1] "RET_FT4"
## [1] "PCTFLOAN"
## Classes 'tbl_df', 'tbl' and 'data.frame':    7703 obs. of  30 variables:
##  $ UNITID     : Factor w/ 7703 levels "100654","100663",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ INSTNM     : Factor w/ 7535 levels "A  and  W Healthcare Educators",..: 95 6760 242 6761 99 6497 1136 391 408 407 ...
##  $ CITY       : Factor w/ 2542 levels "Aberdeen","Abilene",..: 1559 190 1432 1012 1432 2286 27 94 1432 98 ...
##  $ STABBR     : Factor w/ 59 levels "AK","AL","AR",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ CONTROL    : Factor w/ 3 levels "1","2","3": 1 1 2 1 1 1 1 1 1 1 ...
##  $ CCBASIC    : Factor w/ 34 levels "-2","1","10",..: 11 8 14 9 12 9 2 16 11 9 ...
##  $ ADM_RATE   : num  0.526 0.856 NA 0.82 0.533 ...
##  $ SAT_AVG    : num  827 1107 NA 1210 851 ...
##  $ UGDS       : num  4206 11383 201 5451 4811 ...
##  $ UGDS_WHITE : num  0.0333 0.5022 0.2 0.6088 0.0158 ...
##  $ UGDS_BLACK : num  0.0353 0.26 0.4102 0.1255 0.0208 ...
##  $ UGDS_HISP  : num  0.0055 0.0283 0.006 0.0382 0.0121 0.0348 0.0044 0.0101 0.0074 0.0248 ...
##  $ UGDS_ASIAN : num  0.001 0.0518 0.0034 0.0376 0.001 0.0106 0.0025 0.0053 0.0221 0.0227 ...
##  $ UGDS_AIAN  : num  0.0024 0.0022 0 0.0143 0.001 0.0038 0.0044 0.0157 0.0044 0.0074 ...
##  $ UGDS_NHPI  : num  0.001 0.0007 0 0.0002 0.0006 0 0 0.001 0.0016 0 ...
##  $ UGDS_2MOR  : num  0 0.0368 0 0.0172 0.0008 0.0261 0 0.0174 0.0207 0 ...
##  $ UGDS_NRA   : num  0.005 0.017 0 0.0332 0.0243 0.0268 0 0.0057 0.0307 0.01 ...
##  $ UGDS_UNKN  : num  0.0138 0.01 0.2715 0.035 0.0137 ...
##  $ PPTUG_EF   : num  0.0656 0.2607 0.4536 0.2146 0.0802 ...
##  $ NPT4_PUB   : num  15220 14780 NA 18506 11110 ...
##  $ NPT4_PRIV  : num  NA NA 12002 NA NA ...
##  $ COSTT4_A   : num  21475 20621 16370 21107 18184 ...
##  $ TUITFTE    : num  427 800 12450 8056 7733 ...
##  $ INEXPFTE   : num  7437 17020 5532 10211 7618 ...
##  $ PFTFAC     : num  0.8067 0.0072 0.6 0.6221 0.653 ...
##  $ PCTPELL    : num  0.736 0.346 0.68 0.307 0.735 ...
##  $ C150_4     : num  0.352 0.555 0.222 0.461 0.263 ...
##  $ PFTFTUG1_EF: num  0.858 0.504 0.5 0.475 0.881 ...
##  $ RET_FT4    : num  0.65 0.829 0 0.761 0.573 ...
##  $ PCTFLOAN   : num  0.828 0.521 0.77 0.451 0.755 ...
##  - attr(*, "problems")=Classes 'tbl_df', 'tbl' and 'data.frame': 421 obs. of  5 variables:
##   ..$ row     : int  7283 7284 7285 7286 7287 7288 7289 7290 7291 7292 ...
##   ..$ col     : chr  "CCBASIC" "CCBASIC" "CCBASIC" "CCBASIC" ...
##   ..$ expected: chr  "an integer" "an integer" "an integer" "an integer" ...
##   ..$ actual  : chr  "NULL" "NULL" "NULL" "NULL" ...
##   ..$ file    : chr  "'../../CSVs/PreETL_CollegeScorecard.csv'" "'../../CSVs/PreETL_CollegeScorecard.csv'" "'../../CSVs/PreETL_CollegeScorecard.csv'" "'../../CSVs/PreETL_CollegeScorecard.csv'" ...
##  - attr(*, "spec")=List of 2
##   ..$ cols   :List of 30
##   .. ..$ UNITID     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ INSTNM     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ CITY       : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ STABBR     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ CONTROL    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ CCBASIC    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ ADM_RATE   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ SAT_AVG    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS       : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS_WHITE : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS_BLACK : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS_HISP  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS_ASIAN : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS_AIAN  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS_NHPI  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS_2MOR  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS_NRA   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ UGDS_UNKN  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ PPTUG_EF   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ NPT4_PUB   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ NPT4_PRIV  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ COSTT4_A   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ TUITFTE    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ INEXPFTE   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ PFTFAC     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ PCTPELL    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ C150_4     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ PFTFTUG1_EF: list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ RET_FT4    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ PCTFLOAN   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   ..$ default: list()
##   .. ..- attr(*, "class")= chr  "collector_guess" "collector"
##   ..- attr(*, "class")= chr "col_spec"
## CREATE TABLE CSVsPreETL_CollegeScorecard (
## -- Change table_name to the table name you want.
##  UNITID varchar2(4000),
##  INSTNM varchar2(4000),
##  CITY varchar2(4000),
##  STABBR varchar2(4000),
##  CONTROL varchar2(4000),
##  CCBASIC varchar2(4000),
##  ADM_RATE number(38,4),
##  SAT_AVG number(38,4),
##  UGDS number(38,4),
##  UGDS_WHITE number(38,4),
##  UGDS_BLACK number(38,4),
##  UGDS_HISP number(38,4),
##  UGDS_ASIAN number(38,4),
##  UGDS_AIAN number(38,4),
##  UGDS_NHPI number(38,4),
##  UGDS_2MOR number(38,4),
##  UGDS_NRA number(38,4),
##  UGDS_UNKN number(38,4),
##  PPTUG_EF number(38,4),
##  NPT4_PUB number(38,4),
##  NPT4_PRIV number(38,4),
##  COSTT4_A number(38,4),
##  TUITFTE number(38,4),
##  INEXPFTE number(38,4),
##  PFTFAC number(38,4),
##  PCTPELL number(38,4),
##  C150_4 number(38,4),
##  PFTFTUG1_EF number(38,4),
##  RET_FT4 number(38,4),
##  PCTFLOAN number(38,4)
##  );

Cleaned up Data Download

Cleaned data can be downloaded from Data.world as a .csv file. Because the dataset is so large, we filtered to only show some rows.

Hosting User: jlee
Database: S17 DV Final Project
Dataset Name: CollegeScorecard.csv

Download Link: https://query.data.world/s/dv5dl8q1jx2qb3d3bd2976b9d

source("../01 Data/Accessdataworld.R")
## Loading required package: data.world
## Warning in utils::read.table(propsfile, header = FALSE, sep = "=",
## row.names = 1, : incomplete final line found by readTableHeader on 'C:/
## Users/Blanche/Documents/.data.world'

## Warning in utils::read.table(propsfile, header = FALSE, sep = "=",
## row.names = 1, : incomplete final line found by readTableHeader on 'C:/
## Users/Blanche/Documents/.data.world'
##      UNITID          INSTNM              CITY              STABBR         
##  Min.   :100654   Length:10          Length:10          Length:10         
##  1st Qu.:101989   Class :character   Class :character   Class :character  
##  Median :143487   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :126860                                                           
##  3rd Qu.:143536                                                           
##  Max.   :143613                                                           
##                                                                           
##     CONTROL       CCBASIC        ADM_RATE         SAT_AVG    
##  Min.   :1.0   Min.   :-2.0   Min.   :0.5256   Min.   : 827  
##  1st Qu.:1.0   1st Qu.:-2.0   1st Qu.:0.5293   1st Qu.: 967  
##  Median :2.5   Median : 2.0   Median :0.5330   Median :1107  
##  Mean   :2.1   Mean   : 5.9   Mean   :0.6382   Mean   :1038  
##  3rd Qu.:3.0   3rd Qu.:13.5   3rd Qu.:0.6945   3rd Qu.:1144  
##  Max.   :3.0   Max.   :21.0   Max.   :0.8560   Max.   :1181  
##                               NA's   :7        NA's   :7     
##       UGDS         UGDS_WHITE        UGDS_BLACK        UGDS_HISP     
##  Min.   :    2   Min.   :0.00000   Min.   :0.01550   Min.   :0.0055  
##  1st Qu.:   55   1st Qu.:0.01995   1st Qu.:0.04167   1st Qu.:0.0346  
##  Median :  650   Median :0.33715   Median :0.11515   Median :0.0677  
##  Mean   : 2373   Mean   :0.31798   Mean   :0.20395   Mean   :0.1203  
##  3rd Qu.: 3530   3rd Qu.:0.49350   3rd Qu.:0.25093   3rd Qu.:0.1918  
##  Max.   :11383   Max.   :0.77880   Max.   :0.74740   Max.   :0.3380  
##                                                                      
##    UGDS_ASIAN        UGDS_AIAN         UGDS_NHPI          UGDS_2MOR       
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.00025   1st Qu.:0.00000   1st Qu.:0.000000   1st Qu.:0.001275  
##  Median :0.01215   Median :0.00230   Median :0.000350   Median :0.011350  
##  Mean   :0.01906   Mean   :0.00769   Mean   :0.001820   Mean   :0.013410  
##  3rd Qu.:0.03320   3rd Qu.:0.00860   3rd Qu.:0.001225   3rd Qu.:0.019125  
##  Max.   :0.05180   Max.   :0.04510   Max.   :0.013200   Max.   :0.036800  
##                                                                           
##     UGDS_NRA         UGDS_UNKN           PPTUG_EF         NPT4_PUB    
##  Min.   :0.00000   Min.   :0.000000   Min.   :0.0017   Min.   : 1044  
##  1st Qu.:0.00000   1st Qu.:0.001275   1st Qu.:0.1144   1st Qu.: 6268  
##  Median :0.00000   Median :0.010000   Median :0.3100   Median :11395  
##  Mean   :0.00381   Mean   :0.020510   Mean   :0.3820   Mean   : 9764  
##  3rd Qu.:0.00375   3rd Qu.:0.026100   3rd Qu.:0.6693   3rd Qu.:14890  
##  Max.   :0.01700   Max.   :0.070200   Max.   :0.8526   Max.   :15220  
##                                                        NA's   :6      
##    NPT4_PRIV        COSTT4_A        TUITFTE         INEXPFTE    
##  Min.   : 4638   Min.   : 8557   Min.   :  423   Min.   :  535  
##  1st Qu.: 5808   1st Qu.:12818   1st Qu.: 1176   1st Qu.: 5636  
##  Median :10206   Median :20621   Median : 4249   Median : 6830  
##  Mean   :11958   Mean   :21727   Mean   : 5054   Mean   : 7262  
##  3rd Qu.:17634   3rd Qu.:21475   3rd Qu.: 8224   3rd Qu.: 7529  
##  Max.   :22246   Max.   :45162   Max.   :11878   Max.   :17020  
##  NA's   :4       NA's   :5                                      
##      PFTFAC          PCTPELL           C150_4        PFTFTUG1_EF    
##  Min.   :0.0072   Min.   :0.0200   Min.   :0.3525   Min.   :0.0230  
##  1st Qu.:0.2663   1st Qu.:0.3317   1st Qu.:0.4540   1st Qu.:0.1742  
##  Median :0.6410   Median :0.5780   Median :0.5554   Median :0.2027  
##  Mean   :0.5442   Mean   :0.4885   Mean   :0.5167   Mean   :0.3524  
##  3rd Qu.:0.8067   3rd Qu.:0.7081   3rd Qu.:0.5988   3rd Qu.:0.5041  
##  Max.   :1.0000   Max.   :0.8065   Max.   :0.6422   Max.   :0.8578  
##  NA's   :5                         NA's   :7        NA's   :5       
##     RET_FT4          PCTFLOAN     
##  Min.   :0.6505   Min.   :0.0000  
##  1st Qu.:0.7397   1st Qu.:0.1911  
##  Median :0.8288   Median :0.4432  
##  Mean   :0.7721   Mean   :0.4236  
##  3rd Qu.:0.8329   3rd Qu.:0.6591  
##  Max.   :0.8371   Max.   :0.8284  
##  NA's   :7                        
##     State             AreaName         Population 18 Years or Over
##  Length:52          Length:52          Min.   :  441946           
##  Class :character   Class :character   1st Qu.: 1350031           
##  Mode  :character   Mode  :character   Median : 3230085           
##                                        Mean   : 4723276           
##                                        3rd Qu.: 5328190           
##                                        Max.   :29247121           
##  Population 25 Years or Over
##  Min.   :  384272           
##  1st Qu.: 1169237           
##  Median : 2835094           
##  Mean   : 4113026           
##  3rd Qu.: 4638242           
##  Max.   :25257858           
##  Population 25 Years or Over with No Schooling Completed
##  Min.   :  1049                                         
##  1st Qu.: 11192                                         
##  Median : 32307                                         
##  Mean   : 58463                                         
##  3rd Qu.: 63731                                         
##  Max.   :652517                                         
##  Population 25 Years or Over with Bachelor's Degree
##  Min.   :  65787                                   
##  1st Qu.: 200538                                   
##  Median : 464927                                   
##  Mean   : 761284                                   
##  3rd Qu.:1002277                                   
##  Max.   :5002596                                   
##  Population 25 Years or Over with Master's Degree
##  Min.   :  23501                                 
##  1st Qu.:  84904                                 
##  Median : 199057                                 
##  Mean   : 325993                                 
##  3rd Qu.: 470375                                 
##  Max.   :1965005                                 
##  Population 25 Years or Over with Doctorate's Degree
##  Min.   :  4932                                     
##  1st Qu.: 21499                                     
##  Median : 48006                                     
##  Mean   : 80593                                     
##  3rd Qu.:103566                                     
##  Max.   :583790                                     
## # A tibble: 6 × 30
##   UNITID                                 INSTNM       CITY STABBR CONTROL
##    <int>                                  <chr>      <chr>  <chr>   <int>
## 1 100654           Alabama A  and  M University     Normal     AL       1
## 2 100663    University of Alabama at Birmingham Birmingham     AL       1
## 3 100937            Birmingham Southern College Birmingham     AL       2
## 4 105145              GateWay Community College    Phoenix     AZ       1
## 5 143482 Cannella School of Hair Design-Chicago    Chicago     IL       3
## 6 143491 Cannella School of Hair Design-Chicago    Chicago     IL       3
## # ... with 25 more variables: CCBASIC <int>, ADM_RATE <dbl>,
## #   SAT_AVG <int>, UGDS <int>, UGDS_WHITE <dbl>, UGDS_BLACK <dbl>,
## #   UGDS_HISP <dbl>, UGDS_ASIAN <dbl>, UGDS_AIAN <dbl>, UGDS_NHPI <dbl>,
## #   UGDS_2MOR <dbl>, UGDS_NRA <dbl>, UGDS_UNKN <dbl>, PPTUG_EF <dbl>,
## #   NPT4_PUB <int>, NPT4_PRIV <int>, COSTT4_A <int>, TUITFTE <int>,
## #   INEXPFTE <int>, PFTFAC <dbl>, PCTPELL <dbl>, C150_4 <dbl>,
## #   PFTFTUG1_EF <dbl>, RET_FT4 <dbl>, PCTFLOAN <dbl>
## # A tibble: 6 × 8
##   State  AreaName `Population 18 Years or Over`
##   <chr>     <chr>                         <int>
## 1    AL   Alabama                       3718646
## 2    AK    Alaska                        545248
## 3    LA Louisiana                       3510657
## 4    ME     Maine                       1066847
## 5    AZ   Arizona                       5024639
## 6    MD  Maryland                       4582258
## # ... with 5 more variables: `Population 25 Years or Over` <int>,
## #   `Population 25 Years or Over with No Schooling Completed` <int>,
## #   `Population 25 Years or Over with Bachelor's Degree` <int>,
## #   `Population 25 Years or Over with Master's Degree` <int>, `Population
## #   25 Years or Over with Doctorate's Degree` <int>

Side by Side Shiny and Tableau Visualizations

Descriptions: Refer to visualization captions for individual descriptions.

Dataset Column Names:

INSTNM - Institution Name; STABBR - State; CONTROL - 1 = Public. 2 = Private nonprofit. 3 = Private for-profit

Boxplot: Average Cost of Attendance for Type of School

These boxplots (Tableau left, Shiny right) demonstrate the average cost of attendance per type of school.

Histogram: SAT Averages for Universities

These histograms (Tableau left, Shiny right) demonstrate the SAT averages for universities across the United States.

Scatterplot: Instructional Expenditures vs. Net tuition

These scatterplots (Tableau left, Shiny right) explore the correlation between Instructional expenditures per full-time equivalent student and Net tuition revenue per full-time equivalent student.

Crosstab 1: Instructional Expenditures / Cost of Attendance

These crosstabs (Tableau left, Shiny right) demonstrate the ratio of instructional expenses to the average cost of attendance. They are labeled by the average cost of attendance. The red tile indicates a high ratio. The green tile indicates a medium ratio, and the blue tile indicates a low ratio. From the crosstab, one can see that public schools usually have a higher ratio, while private non-profit schools usually have a medium ratio. Private schools mostly have a medium to low ratio with the exception of some high ratios in four states.

Crosstab 2: Tuition Revenue / Total Cost

These crosstabs (Tableau left, Shiny right) demonstratae the ratio of the net tuition revenue per full-time student to the average cost of attendance. The red tile indicates a high ratio. The green tile indicates a medium ratio, and the blue tile indicates a low ratio. From the crosstab, one can see that public schools usually a medium ratio, while private non-profit schools usally have a medium to high ratio. Private schools mostly have a high ratio with the exception of some low and medium ratios some states.

Map 1: Region Cost of Attendance (Instructional Expenditures / Cost of Attendance)

These maps (Tableau left, Shiny right) demonstrate the distribution of instructional expenditure / cost of attendance ratio across the United States, where darker colors indicate higher ratios.



Map 2: Tuition Revenue to Total Cost

These maps (Tableau left, Shiny right) demonstrate the distribution of tuition revenue / total cost ratio across the United States, where darker colors indicate higher ratios.

Barchart: Instructional Expense per Type of Instutition


This barchart + table calculations (Tableau left, Shiny right) display the sum of instructional expenses across each control (public, private non-profit, and private for profit) for each state. The line shows the average of the sum of instructional expenses. This ID Sets on a map for barcharts has two sets: High Net Price and Medium Net Price for public schools. Net price is the actual amount families pay on average. The dots represent schools in the High Net Price.

Shiny Visualization and Published Application

Description: Full size static .pngs of the Shiny application, as well as a link to the live published version.

Published Link:
https://ehjkim.shinyapps.io/shinyfinal/

Boxplot: Average Cost of Attendance for Type of School




Histogram: SAT Averages for Universities




Scatterplot: Instructional Expenditures vs. Net tuition




Crosstab 1: Instructional Expenditures/Cost of Attendance




Crosstab 2: Tuition Revenue / Total Cost




Map 1: Region Cost of Attendance (Instructional Expenditures / Cost of Attendance)




Map 2: Tuition Revenue to Total Cost




Barchart: Instructional Expense per Type of Instutition




Tableau / Tableau Action Generated Visualizations

Descriptions: Full size static .pngs of the tableau visualizations. Refer to visualization captions for individual descriptions.

Boxplot: Average Cost of Attendance for Type of School




Histogram: SAT Averages for Universities




Scatterplot: Instructional Expenditures vs. Net tuition




Crosstab 1: Instructional Expenditures/Cost of Attendance




Crosstab 2: Tuition Revenue / Total Cost




Map 1: Region Cost of Attendance (Instructional Expenditures / Cost of Attendance)




Map 2: Tuition Revenue to Total Cost




Barchart: Instructional Expense per Type of Instutition